Abstract

We  consider  the  problem  of  storing  and  searching  a  large  state  space  obtained  from  a  high- 
level  model  such  as  a  queueing  network  or  a  Petri  net.  After  reviewing  the  traditional 
technique  based  on  a  single  search  tree,  we  demonstrate  how  an  approach  based  on  multiple 
levels  of  search  trees  offers  advantages  in  both  memory  and  execution  complexity,  and  how 
solution  algorithms  based  on  Kronecker  operators  greatly  benefit  from  these  results.  Further 
execution  time  improvements  are  obtained  by  exploiting  the  concept  of  “event  locality.”  We 
apply  our  technique  to  three  large  parametric  models,  and  give  detailed  experimental  results. 
1 Introduction

Extremely  complex  systems  are  increasingly  common.  Various  types  of  high-level  models 
are  used  to  describe  them,  study  them,  and  forecast  the  effect  of  possible  modifications. 
Unfortunately,  the  logical  and  dynamic  analysis  of  these  models  is  often  hampered  by  the 
combinatorial  explosion  of  their  state  spaces,  a  problem  inherent  with  discrete-state  systems. 
Possible  solutions  are: 

•  Use  approximate  or  bounding  analysis  methods.  These  are  particularly  effective  when 
studying  the  performance  or  reliability  of  a  system  described  by  a  stochastic  model 
(such  as  a  queueing  network  or  a  stochastic  Petri  net). 

• Use discrete-event simulation. This does not require the generation and storing of the
entire state space. However, it is a Monte Carlo method, and hence it can only state that
a certain condition is (or is not) encountered with a certain probability. For example, if
the simulation runs performed during an experiment do not find a deadlock, we cannot
conclude with certainty that the system is deadlock-free.

• Use exact methods that cope with large state spaces. These include the use of better
algorithms and data structures and of distributed approaches exploiting multiple
workstations, plus, of course, the trivial (but expensive) approach of increasing the
amount of RAM.

We  stress  that  the  above  approaches  are  not  mutually  exclusive.  Indeed,  it  is  quite 
common  to  use  exact  methods  in  the  early  stages  of  a  system’s  design,  when  studying 
rough  models  over  a  wide  range  of  parameter  combinations,  and  then  focus  on  more  detailed 
models  using  approximate  methods  or  simulation.  Indeed,  many  approximate  decomposition 
approaches  use  exact  methods  at  the  submodel  level,  thus  introducing  errors  only  when 
exchanging or combining (exact) results from different submodels. The ability to
apply exact methods to larger models will then normally result in better approximations: it
is  likely  that  the  study  of  a  large  model  decomposed  into  four  medium-size  submodels,  each 
solved  with  exact  methods,  will  be  more  accurate  than  that  of  the  same  model  decomposed 
into  ten  small-size  submodels. 

The  first  problem  when  applying  exact  methods  is  the  generation  and  storage  of  the  state 
space.  For  performance  or  reliability  analysis,  the  model  is  then  used  to  generate  a  stochastic 
process,  often  a  continuous-time  Markov  chain  (CTMC),  which  is  solved  numerically  for 
steady-state  or  transient  measures.  The  state  space  itself  can  be  of  interest  since  it  can  be 
used  to  answer  questions  such  as  existence  of  deadlocks  and  livelocks  or  liveness. 

In  this  paper,  we  focus  on  techniques  to  store  and  search  the  state  space.  However,  while 
we  do  not  discuss  the  solution  of  the  stochastic  process  explicitly,  our  results  apply  not  only 
to  the  “logical”  analysis  of  the  model,  but  even  more  so  to  its  “stochastic  analysis”  (indeed, 
this  was  our  initial  motivation).  This  is  particularly  true  given  the  recent  developments  on 
the  solution  of  complex  Markov  models  using  Kronecker  operators  [2,  3,  9,  12,  13,  15,  17] 
which  do  not  require  the  storage  of  the  infinitesimal  generator  of  the  CTMC  explicitly.  In 
this case, the storage bottleneck is indeed due to the state space and the probability vectors
allocated  when  computing  the  numerical  solution. 

Sections  2,  3,  and  4  define  the  type  of  high-level  formalisms  we  use,  discuss  the  reachability 
set  and  its  storage,  and  motivate  the  decomposition  of  a  model  into  submodels,  respectively. 
Our main contributions are in Sections 5 and 6, where we introduce a multilevel data structure
and show how it can be used to save both memory and execution time when building
the  reachability  set.  Section  7  further  explores  storing  and  searching  the  reachability  set 
after  it  has  been  built.  Finally,  Section  8  contains  our  final  remarks. 


2  High-level  model  description 

We  implemented  our  techniques  in  SMART  [7],  using  the  GSPN  formalism  [1,  5].  However, 
these  techniques  can  be  applied  to  any  model  expressed  in  a  “high-level  formalism”  which 
defines: 

• The “potential set”, $\hat{\mathcal{R}}$. This is a discrete set, assumed finite, to which the states of
the model must belong¹.

• A finite set of possible events, $\mathcal{E}$.

• An initial state, $\mathbf{i}^0 \in \hat{\mathcal{R}}$.

• A boolean function defining whether an event is active (can occur) in a state,
$Active : \mathcal{E} \times \hat{\mathcal{R}} \to \{True, False\}$.

• A function defining the effect of the occurrence of an active event on a state,
$New : \mathcal{E} \times \hat{\mathcal{R}} \to \hat{\mathcal{R}}$.

Fig. 1 and 2 show the two GSPNs used throughout this paper to illustrate the effect of
our techniques (ignore for the moment the dashed boxes). The first GSPN models a kanban
system, from [9, 17]. It is composed of four instances of essentially the same sub-GSPN.
The synchronizing transitions $t_{synch1}$ and $t_{synch2}$ can be either both timed or both immediate;
we indicate the two resulting models as kanban-timed and kanban-immediate. The second
GSPN models a flexible manufacturing system, from [10], except that the cardinality of all
arcs is constant, unlike the original model (this does not affect the number of reachable
markings). We indicate this model as FMS. We do not describe these models in more detail;
the interested reader is referred to the original publications where they appeared.


¹A point about notation: we denote sets by upper case calligraphic letters. Lower and upper case bold
letters denote vectors and matrices, respectively. $\eta(\mathbf{A})$ is the number of nonzero entries in a matrix $\mathbf{A}$;
$\mathbf{A}_{i,j}$ is the entry in row $i$ and column $j$ of $\mathbf{A}$; $\mathbf{A}_{\mathcal{X},\mathcal{Y}}$ is the submatrix of $\mathbf{A}$ corresponding to the set of rows $\mathcal{X}$ and
the set of columns $\mathcal{Y}$.
Figure 1: The GSPN of a Kanban system.

3  The  reachability  set 

From the high-level description, we can build the “reachability set”, $\mathcal{R} \subseteq \hat{\mathcal{R}}$. This is the
set of states that can be reached from $\mathbf{i}^0$ through the occurrence of any sequence of enabled


Figure  2:  The  GSPN  of  a  FMS  system. 
BuildRS(in: Events, Active, New; out: R);
1.  R ← ∅;                          /* R: states explored so far */
2.  U ← {i⁰};                       /* U: states found but not yet explored */
3.  while U ≠ ∅ do
4.      i ← ChooseRemove(U);
5.      R ← R ∪ {i};
6.      for each e ∈ E s.t. Active(e, i) = True do
7.          j ← New(e, i);
8.          SearchInsert(j, R ∪ U, U);

Figure  3:  Procedure  BuildRS 

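events. Formally, $\mathcal{R}$ is the smallest subset of $\hat{\mathcal{R}}$ satisfying:

• $\mathbf{i}^0 \in \mathcal{R}$, and

• $\mathbf{i} \in \mathcal{R} \wedge \exists e \in \mathcal{E},\ Active(e, \mathbf{i}) = True \wedge \mathbf{i}' = New(e, \mathbf{i}) \Rightarrow \mathbf{i}' \in \mathcal{R}$.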

While  a  high-level  model  usually  specifies  other  information  as  well  (the  stochastic  timing 
of  the  events,  the  measures  that  should  be  computed  when  solving  the  model,  etc.),  the  above 
description  is  sufficient  for  now,  and  it  is  not  tied  to  any  particular  formalism. 

$\mathcal{R}$ can be generated using the state-space exploration procedure BuildRS shown in Fig. 3,
which terminates if $\mathcal{R}$ is finite. This is a search of the graph implicitly defined by the model.
Function ChooseRemove chooses an element from its argument (a set of states), removes it,
and returns it. Procedure SearchInsert searches for its first argument (a state) in its second
argument (a set of states) and, if it is not found, inserts the state in its third argument (also
a set of states).
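The exploration loop of Fig. 3 can be sketched in Python as follows. This is only an illustration: Python's hash-based `set` stands in for the search trees discussed later, and the two-place token model (`toy_active`, `toy_new`) is a hypothetical example that does not come from the paper.

```python
def build_rs(events, active, new, initial):
    """Explore the state space and return the set of reachable states."""
    explored = set()            # R: states explored so far
    unexplored = {initial}      # U: states found but not yet explored
    while unexplored:
        i = unexplored.pop()    # ChooseRemove: any element will do
        explored.add(i)
        for e in events:
            if active(e, i):
                j = new(e, i)
                # SearchInsert: look for j in R ∪ U, insert it in U if absent
                if j not in explored and j not in unexplored:
                    unexplored.add(j)
    return explored


# Hypothetical toy model: tokens move back and forth between two places.
def toy_active(event, state):
    a, b = state
    return a > 0 if event == "fwd" else b > 0


def toy_new(event, state):
    a, b = state
    return (a - 1, b + 1) if event == "fwd" else (a + 1, b - 1)


reachable = build_rs(["fwd", "back"], toy_active, toy_new, (3, 0))
print(sorted(reachable))   # [(0, 3), (1, 2), (2, 1), (3, 0)]
```

Because `ChooseRemove` is free to pick any unexplored state, the order of exploration does not affect the resulting set.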

The “reachability graph”, $(\mathcal{R}, \mathcal{A})$, has as nodes the reachable states, and an arc from $\mathbf{i}$
to $\mathbf{j}$ iff there is an event $e$ such that $Active(e, \mathbf{i}) = True$ and $New(e, \mathbf{i}) = \mathbf{j}$. Depending on
the type of analysis, the arc might be labelled with $e$; this might result in a multigraph, if
multiple events can cause the same change of state. While we focus on $\mathcal{R}$ alone, the size of
$\mathcal{A}$ affects the complexity of BuildRS.

A total order can be defined over the elements of $\mathcal{R}$: $\mathbf{i} < \mathbf{j}$ iff $\mathbf{i}$ precedes $\mathbf{j}$ in lexical order.
We can then define a function $Compare : \mathcal{R} \times \mathcal{R} \to \{-1, 0, 1\}$ returning -1 if the first
argument precedes the second one, 1 if it follows it, and 0 if the two arguments are identical.
This order allows the efficient search and insertion of a state during BuildRS. Once $\mathcal{R}$ has
been built, we can define a bijection $\Psi : \mathcal{R} \to \{0, \ldots, |\mathcal{R}| - 1\}$, such that $\Psi(\mathbf{i}) < \Psi(\mathbf{j})$ iff
$\mathbf{i} < \mathbf{j}$.
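As a small illustration of the lexical order and of the bijection $\Psi$, the sketch below assumes that states are equal-length tuples of integers and that the finished reachability set is kept as a sorted Python list, so that $\Psi$ becomes a binary search. The names `compare` and `make_psi` are illustrative, not taken from any existing tool.

```python
from bisect import bisect_left


def compare(i, j):
    """Return -1, 0, or 1 according to the lexical order of two state vectors."""
    for a, b in zip(i, j):
        if a != b:
            return -1 if a < b else 1
    return 0


def make_psi(reachable):
    """Sort the reachable states lexically and return (ordered, psi)."""
    ordered = sorted(reachable)     # Python compares tuples lexically

    def psi(state):
        # Binary search: position of the state, or None if it is unreachable.
        pos = bisect_left(ordered, state)
        if pos < len(ordered) and ordered[pos] == state:
            return pos
        return None

    return ordered, psi


ordered, psi = make_psi({(1, 0), (0, 1), (0, 0)})
print(ordered, psi((0, 1)), psi((1, 1)))
```

Returning `None` for an unreachable state mirrors the "null" result used by the procedures that look up a state index.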

Common techniques to store and search the sets $\mathcal{R}$ and $\mathcal{U}$ include hashing and search
trees. Hashing would work reasonably well if we had a good bound on the final size of
the reachability set $\mathcal{R}$, but this is not usually the case. We prefer search trees: when kept
balanced, they have more predictable behavior than hashing.
We consider two alternatives, both based on binary search trees: splay trees [11] and
AVL trees [14]. The execution complexity² of BuildRS when using either method is

$$O\left(|\mathcal{R}| \cdot |\mathcal{E}| \cdot C_{Active} + |\mathcal{A}| \cdot (C_{New} + \log|\mathcal{R}| \cdot C_{Compare})\right),$$

where $C_x$ is the (average) cost of a call to procedure $x$. For AVL trees, the storage complexity,
expressed in bits, is

$$O\left(|\mathcal{R}| \cdot (B_{state} + 2 B_{pointer} + 2)\right),$$

where $B_{state}$ is the (average) number of bits to store a state and $B_{pointer}$ is the number of bits for
a pointer. The additional two bits are used to store the node balance information. For splay
trees, the complexity is the same, except for the two-bit balance, which is not required.

Regarding the use of binary search trees in BuildRS, we have another choice to make.
We can use a single tree to store $\mathcal{R} \cup \mathcal{U}$ in BuildRS as shown in Fig. 4(a,b,c), or two separate
trees, as in Fig. 4(d). Using a single tree has some advantages:

• No deletion, possibly requiring a rebalancing, is needed when moving a state from $\mathcal{U}$
to $\mathcal{R}$.

• Procedure SearchInsert needs to perform a single search with $\log(|\mathcal{R} \cup \mathcal{U}|)$ comparisons,
instead of two searches in the two trees for $\mathcal{R}$ and $\mathcal{U}$, which overall require more
comparisons (up to twice as many), since

$$\log|\mathcal{R}| + \log|\mathcal{U}| = \log(|\mathcal{R}| \cdot |\mathcal{U}|) > \log(|\mathcal{R}| + |\mathcal{U}|) = \log(|\mathcal{R} \cup \mathcal{U}|)$$

(of course, we mean “the current $\mathcal{R}$ and $\mathcal{U}$” in the above expressions).

However, it also has disadvantages. As explored and unexplored states are stored together
in a single tree, there must be a way to identify and access the nodes of $\mathcal{U}$. We can accomplish
this by:

• Maintaining a separate linked list pointing to the unexplored states. This requires an
additional $O(2|\mathcal{U}| \cdot B_{pointer})$ bits, as shown in Fig. 4(a).

• Storing an additional pointer in each tree node, pointing to the next unexplored state.
This requires an additional $O(|\mathcal{R}| \cdot B_{pointer})$ bits, as shown in Fig. 4(b).

• Storing the markings in a dynamic array structure. In this case, the nodes of the
tree contain indices to the array of markings, instead of the markings themselves. If
markings are added to the array in the order they are discovered, $\mathcal{R}$ occupies the
beginning of the array, while $\mathcal{U}$ occupies the end. This requires an additional $O(|\mathcal{R}| \cdot
B_{index})$ bits, where $B_{index}$ is the number of bits required to index the array of markings,
as shown in Fig. 4(c).
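The array-based scheme of Fig. 4(c) can be sketched as follows, assuming a Python list for the dynamic array of markings and a dictionary as a stand-in for the search tree that maps a state to its array index. The cursor name `next_unexplored` and the toy model are illustrative assumptions.

```python
def build_rs_array(events, active, new, initial):
    """Return the reachable states in the order in which they were discovered."""
    states = [initial]        # dynamic array of markings, in discovery order
    index = {initial: 0}      # stand-in for the search tree: state -> array index
    next_unexplored = 0       # states[:next_unexplored] is R, the rest is U
    while next_unexplored < len(states):
        i = states[next_unexplored]
        next_unexplored += 1  # moving i from U to R needs no deletion
        for e in events:
            if active(e, i):
                j = new(e, i)
                if j not in index:           # single search in R ∪ U
                    index[j] = len(states)
                    states.append(j)
    return states


# Hypothetical toy model: tokens move back and forth between two places.
def toy_active(event, state):
    a, b = state
    return a > 0 if event == "fwd" else b > 0


def toy_new(event, state):
    a, b = state
    return (a - 1, b + 1) if event == "fwd" else (a + 1, b - 1)


order = build_rs_array(["fwd", "back"], toy_active, toy_new, (3, 0))
print(order)   # [(3, 0), (2, 1), (1, 2), (0, 3)]
```

With this layout the boundary between explored and unexplored states is a single integer, at the price of always exploring states in discovery (breadth-first) order.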

The size of $\mathcal{R}$ for our three models is shown in Table 1, as a function of $N$. Since
the models are GSPNs, they give rise to “vanishing markings”, that is, markings enabling
only immediate transitions, in which the sojourn time is zero. In all our experiments, these
markings are eliminated “on the fly” [8], so that all our figures regarding $\mathcal{R}$ reflect only the
“tangible” reachable markings.

²We leave known constants inside the big-O notation to stress that they do make a difference in practice,
even if they are formally redundant.
Figure 4: Four simple storage schemes for $\mathcal{R}$ and $\mathcal{U}$.


N    kanban-timed    kanban-immediate          FMS
1             160                 152           54
2           4,600               3,816          810
3          58,400              41,000        6,520
4         454,475             268,475       35,910
5       2,546,432           1,270,962      152,712
6      11,261,376           4,785,536      537,768
7               -                   -    1,639,440

Table 1: Reachability set size $|\mathcal{R}|$ for the three models, as a function of $N$.


4  Structured  state  spaces 


We now assume that $\hat{\mathcal{R}}$ can be expressed as the Cartesian product of $K$ “local” state spaces,
$\hat{\mathcal{R}} = \hat{\mathcal{R}}^0 \times \cdots \times \hat{\mathcal{R}}^{K-1}$, and that $\hat{\mathcal{R}}^k$ is simply $\{0, \ldots, n_k - 1\}$, although $n_k$ might not be known
in advance. Hence, a (global) state is a vector $\mathbf{i} \in \{0, \ldots, n_0 - 1\} \times \cdots \times \{0, \ldots, n_{K-1} - 1\}$. For
example, if the high-level model is a single-class queueing network, $i_k$ could be the number
of customers in queue $k$. However, a realistic model can easily have dozens of queues, so
it might be more efficient to partition the queues into $K$ sets, and let $n_k$ be the number of
possible combinations of customers in the queues of partition $k$. An analogous discussion
applies to stochastic Petri nets (SPNs), and even to more complex models, as long as they
have a finite state space.

Such a structured model definition is essential to the Kronecker approach for the specification
and solution of complex Markov models recently proposed by various authors
[2, 3, 12, 13, 15]. The main idea is that the infinitesimal generator $\mathbf{Q}$ of a large model
composed of $K$ submodels can be described as (a submatrix of) a sum of Kronecker products
of $K$ smaller matrices, each related to a different submodel. Without going into further
details, we simply list a few key features of this approach, which prompted much of our work.

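• The potential set $\hat{\mathcal{R}}$ is the Cartesian product of $K$ local state spaces, one for each
submodel. Our lexical order can be applied just as well to $\hat{\mathcal{R}}$, hence we can define a
bijection $\hat{\Psi} : \hat{\mathcal{R}} \to \{0, \ldots, |\hat{\mathcal{R}}| - 1\}$, analogous to $\Psi$. Indeed, $\hat{\Psi}$ has a fundamental
advantage over $\Psi$: its value equals its argument interpreted as a mixed base integer,

$$\hat{\Psi}(\mathbf{i}) = (\ldots((i_0) n_1 + i_1) n_2 \cdots) n_{K-1} + i_{K-1} = \sum_{k=0}^{K-1} i_k \cdot n_{k+1}^{K-1},$$

where $n_i^j = \prod_{k=i}^{j} n_k$.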

• The reachability set $\mathcal{R}$ may coincide with $\hat{\mathcal{R}}$, but it is more likely to be much smaller,
since many combinations of local states might not occur.

• Numerical solutions compute the steady-state probabilities of each state using either
a vector $\hat{\boldsymbol{\pi}}$ of size $|\hat{\mathcal{R}}|$ or a vector $\boldsymbol{\pi}$ of size $|\mathcal{R}|$. In the former case, the probability
of state $\mathbf{i} \in \mathcal{R}$ is stored in $\hat{\pi}_{\hat{\Psi}(\mathbf{i})}$, while the entries of $\hat{\boldsymbol{\pi}}$ corresponding to unreachable
states are not used: they are initially set to zero and never modified [12]. In the
latter case, the same probability is stored in $\pi_{\Psi(\mathbf{i})}$ [13]. We prefer this second approach
because it requires potentially much less memory. However, as described by Kemper, its
complexity has an additional $O(\log|\mathcal{R}|)$ multiplicative factor, since, for each reachable
state $\mathbf{i} \in \mathcal{R}$ and for each possible transition from $\mathbf{i}$ to $\mathbf{j}$, the index $\Psi(\mathbf{j})$ of $\mathbf{j}$ in $\boldsymbol{\pi}$ must
be computed using a binary search.

• Any steady-state or transient solution method that avoids storage and execution complexity
of order $|\hat{\mathcal{R}}|$ must iterate over the elements of $\mathcal{R}$ in some order. The lexical
order is acceptable, especially for methods such as Power, Jacobi, or Uniformization,
which are not affected by the order in which the variables are considered. To clarify the
relevant complexity issues, we show procedure VectMatrMultiply in Fig. 5, to perform
the assignment

$$\mathbf{y} \leftarrow \mathbf{y} + \mathbf{x} \cdot (\mathbf{A}^0 \otimes \mathbf{A}^1 \otimes \cdots \otimes \mathbf{A}^{K-1})_{\mathcal{R},\mathcal{R}}$$

where $\mathbf{A}^k$ is a matrix of order $n_k$ and the subscript “$\mathcal{R},\mathcal{R}$” indicates the submatrix of
the Kronecker product corresponding to the reachable states only. This is the main
operation needed when solving for the steady state or transient probabilities for any
VectMatrMultiply(in: x, K, R, A^0, A^1, ..., A^{K-1}; inout: y)
 1.  for each i ∈ R do
 2.      I ← Ψ(i);
 3.      for each j_0 s.t. A^0_{i_0,j_0} > 0 do
 4.          a_0 ← A^0_{i_0,j_0};
 5.          for each j_1 s.t. A^1_{i_1,j_1} > 0 do
 6.              a_1 ← a_0 · A^1_{i_1,j_1};
                 ...
 7.              for each j_{K-1} s.t. A^{K-1}_{i_{K-1},j_{K-1}} > 0 do
 8.                  a_{K-1} ← a_{K-2} · A^{K-1}_{i_{K-1},j_{K-1}};
 9.                  J ← Ψ(j);
10.                  if J ≠ null then
11.                      y_J ← y_J + x_I · a_{K-1};

Figure  5:  Procedure  to  multiply  a  vector  by  a  submatrix  of  a  Kronecker  product. 


approach based on Kronecker algebra over data structures of size $O(|\mathcal{R}|)$. For example,
with the Power method, $\mathbf{x}$ is the old iterate for $\boldsymbol{\pi}$ while $\mathbf{y}$ accumulates what will be
the new iterate. We stress that the procedure VectMatrMultiply relies heavily on the
extreme sparsity of the matrices involved (assumed to be stored in sparse row-wise
format), while algorithms such as those proposed by Plateau and Stewart [16] are
more appropriate for full matrices.

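In the case of superposed GSPNs [12], the test $J \ne$ null of statement 10 is always
satisfied; hence, it should be apparent that procedure VectMatrMultiply considers all
and only the nonzero entries in the submatrix $\mathbf{A}_{\mathcal{R},\mathcal{R}}$ of $\mathbf{A} = \mathbf{A}^0 \otimes \cdots \otimes \mathbf{A}^{K-1}$. The overall complexity is
then affected by the $\log|\mathcal{R}|$ overhead to compute the index $J$ of $\mathbf{j}$ in $\mathcal{R}$, in statement
9, within the innermost for-loop:

$$O\left(\eta(\mathbf{A}_{\mathcal{R},\mathcal{R}}) \cdot \log|\mathcal{R}|\right).$$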
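The mixed-base index and the multiplication of Fig. 5 can be sketched together in Python. The sketch rests on several assumptions that are not in the paper: each matrix is stored row-wise as a nested dictionary `{row: {col: value}}` holding only nonzero entries, $\mathcal{R}$ is a lexically sorted list of tuples, and $\Psi$ is implemented as a binary search over the mixed-base values of the reachable states. The nested for-loops of Fig. 5 are written as a recursion so that $K$ need not be fixed.

```python
from bisect import bisect_left


def psi_hat(state, sizes):
    """Mixed-base value of a state vector; sizes[k] plays the role of n_k."""
    value = 0
    for i_k, n_k in zip(state, sizes):
        value = value * n_k + i_k
    return value


def vect_matr_multiply(x, reachable, mats, sizes, y):
    """y <- y + x * (A^0 (x) ... (x) A^{K-1}), restricted to reachable states."""
    keys = [psi_hat(s, sizes) for s in reachable]   # increasing, since sorted

    def psi(state):
        key = psi_hat(state, sizes)
        pos = bisect_left(keys, key)                # the O(log |R|) search
        return pos if pos < len(keys) and keys[pos] == key else None

    K = len(mats)

    def expand(k, i, j_prefix, a, I):
        if k == K:
            J = psi(tuple(j_prefix))                # statement 9
            if J is not None:                       # statement 10
                y[J] += x[I] * a                    # statement 11
            return
        # Visit only the stored (nonzero) entries of row i_k of A^k.
        for j_k, value in mats[k].get(i[k], {}).items():
            expand(k + 1, i, j_prefix + [j_k], a * value, I)

    for I, i in enumerate(reachable):
        expand(0, i, [], 1.0, I)
    return y


# Hypothetical two-submodel example in which only two of four states are reachable.
A0 = {0: {1: 1.0}, 1: {0: 1.0}}
A1 = {0: {0: 1.0, 1: 1.0}, 1: {1: 1.0}}
result = vect_matr_multiply([2.0, 3.0], [(0, 0), (1, 0)], [A0, A1], (2, 2), [0.0, 0.0])
print(result)   # [3.0, 2.0]
```

In this example the entries leading to the unreachable states (0, 1) and (1, 1) are generated and then discarded by the null test, which is exactly the work that is avoided when the test is always satisfied.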

Hence, two main limitations in [13] prevent the application of this approach to truly
enormous problems. First, the storage of the state space $\mathcal{R}$ and of the iteration vectors, also
of size $|\mathcal{R}|$, might require excessive memory. Second, from an execution time standpoint, the
$\log|\mathcal{R}|$ factor represents an additional complexity with respect to the traditional approach
where the iteration matrix is stored explicitly.

In  the  next  sections,  we  show  how  the  memory  requirements  can  be  reduced,  while 
improving  the  execution  complexity  at  the  same  time. 


5 A multilevel search tree to store $\mathcal{R}$

We extend work by Chiola, who defined a multilevel technique to store the reachable markings
of a SPN [4]. Since the state, or marking, of a SPN with $K$ places can be represented as a
fixed-size vector $\mathbf{i} = (i_0, i_1, \ldots, i_{K-1})$, he proposed the strategy illustrated in Fig. 6, copied
from [4]. In the figure, it is assumed that the SPN has three places, and that there are five
reachable markings, a through e. A three-level tree is then used. The first level discriminates
markings on their first component (either 0 or 1); the second level discriminates on their
second component (also either 0 or 1) within a given first component; finally, the third level
fully determines the marking using the third component (1, 2, or 4).

Figure 6: Chiola’s storage scheme to store the reachability set of a SPN.

While  it  is  sometimes  possible  to  apply  the  Kronecker  approach  where  “submodels”  are 
individual  places  of  the  SPN  [9],  this  is  seldom  desirable.  Hence,  we  will  use  a  multilevel 
approach  with  K  levels,  one  for  each  submodel  (a  sub-GSPN),  as  shown  in  Fig.  7.  Before 
explaining  this  data  structure,  we  need  to  define  the  following  sets: 

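• The set of reachable substates up to $k$:

$$\mathcal{R}_0^k = \{(i_0, \ldots, i_k) \in \hat{\mathcal{R}}^0 \times \cdots \times \hat{\mathcal{R}}^k : \exists (i_{k+1}, \ldots, i_{K-1}) \in \hat{\mathcal{R}}^{k+1} \times \cdots \times \hat{\mathcal{R}}^{K-1},\ (i_0, \ldots, i_{K-1}) \in \mathcal{R}\}.$$

Clearly, $\mathcal{R}_0^{K-1}$ coincides with $\mathcal{R}$ itself, while $\mathcal{R}_0^0$ coincides with $\hat{\mathcal{R}}^0$ iff every element of
the local state space 0 can actually occur in some global state, and so on.

• The set of reachable local states in $\hat{\mathcal{R}}^k$ conditioned on a reachable substate $(i_0, \ldots, i_{k-1}) \in \mathcal{R}_0^{k-1}$:

$$\mathcal{R}^k(i_0, \ldots, i_{k-1}) = \{i_k \in \hat{\mathcal{R}}^k : (i_0, \ldots, i_k) \in \mathcal{R}_0^k\}.$$

Clearly, this set, when defined (i.e., when indeed $(i_0, \ldots, i_{k-1}) \in \mathcal{R}_0^{k-1}$), is never empty.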

In Fig. 7, level $k \in \{0, \ldots, K-1\}$ contains $|\mathcal{R}_0^{k-1}|$ search trees (a single tree when $k = 0$), each one storing
$\mathcal{R}^k(i_0, \ldots, i_{k-1})$, for a different reachable substate $(i_0, \ldots, i_{k-1}) \in \mathcal{R}_0^{k-1}$. Thus, a generic
tree at level $k < K-1$ has the structure shown in Fig. 8. In each one of its nodes, one
pointer points to a tree at level $k+1$. In the drawing, the local states for level $k$ are
stored in an unsorted dynamically extensible array, in the order in which they are found,
and are pointed to by another pointer in each node. Alternatively, it is possible to store a local
state directly in the node. If the submodel is quite complex, this second alternative might
not be as memory effective, since a local state can require several bytes for its encoding, which
are then repeated in each node, instead of just a pointer. Furthermore, if we can assume an
upper bound on the number of local states for a given submodel (e.g., $2^{16}$), we can store an
index (16 bits) into the local state array, instead of a pointer (usually 32 bits).
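The shape of the multilevel structure can be sketched with nested Python dictionaries, one dictionary per tree. This is an assumption made for brevity: dictionaries are hash-based, not the splay or AVL trees used in the paper, and the five example states are hypothetical. The helper `nodes_per_level` counts the nodes used at each level, that is, the sizes of the sets of reachable substates.

```python
def insert(root, state):
    """Insert a state vector; return True if it was not already stored."""
    node = root
    is_new = False
    for local in state[:-1]:
        if local not in node:
            node[local] = {}      # new (empty) tree at the next level
            is_new = True
        node = node[local]
    last = state[-1]
    if last not in node:
        node[last] = None         # last-level nodes carry no pointer to a lower tree
        is_new = True
    return is_new


def nodes_per_level(root, K):
    """Return the number of tree nodes used at levels 0 .. K-1."""
    counts = []
    level = [root]
    for k in range(K):
        counts.append(sum(len(tree) for tree in level))
        if k < K - 1:
            level = [sub for tree in level for sub in tree.values()]
    return counts


root = {}
for s in [(0, 0, 1), (0, 1, 2), (1, 0, 1), (1, 0, 2), (1, 1, 4)]:
    insert(root, s)
print(nodes_per_level(root, 3))   # [2, 4, 5]
```

The last count always equals the number of stored states, since every reachable state owns exactly one node at the last level.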
Figure 7: A multilevel storage scheme for $\mathcal{R}$.


Figure 8: A tree at level $k < K-1$, pointing to trees at level $k+1$.


The total number of nodes used for trees at level $k$ equals the number of reachable
substates up to that level, $|\mathcal{R}_0^k|$, hence the total number of tree nodes for our multilevel data
structure is

$$\sum_{k=0}^{K-1} |\mathcal{R}_0^k| \ge |\mathcal{R}_0^{K-1}| = |\mathcal{R}|,$$

where the last expression is the number of nodes required in the standard single-level approach
of Fig. 4. Two observations are in order:


• First, the total number of nodes at levels 0 through $K-2$ is a small fraction of the
nodes at level $K-1$, that is, of $|\mathcal{R}|$, provided the trees at the last level, $K-1$, are
not “too small”. This is ensured by an appropriate partitioning of a large model into
submodels. In other words, only the memory used by the nodes of the trees at the last
level really matters.

• Second, the nodes of the trees at the last level do not require a pointer to a tree at a
lower level, hence each node requires only three pointers (96 bits). Again, if we can
assume a bound on the number of local states for the last submodel, $n_{K-1} \le 2^b$, we
can in principle implement dynamic data structures that require only $3b$ bits per node,
plus some overhead to implement dynamic arrays. Given a model, there is usually some
freedom in choosing the number of submodels, that is, the number of levels $K$. We
might then be able to define the last submodel in such a way that it has at most $2^8$ local
states, resulting in only slightly more than $3|\mathcal{R}|$ bytes to store the entire reachability
set. When this is not the case, the next easily implementable step, $2^{16}$, is normally
more than enough. Even in this case, the total memory requirements are still much
better than with a single-level implementation.
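As a worked instance of the per-node arithmetic in the second observation, assume 32-bit pointers, ignore the balance bits and the dynamic-array overhead, and let each of the three fields of a last-level node shrink to an index of $b$ bits:

```latex
\[
\underbrace{3 \times 32}_{\text{three pointers}} = 96 \text{ bits} = 12 \text{ bytes},
\qquad
3 \times 8 = 24 \text{ bits} = 3 \text{ bytes} \quad (b = 8),
\qquad
3 \times 16 = 48 \text{ bits} = 6 \text{ bytes} \quad (b = 16).
\]
```

Under these assumptions, a last-level node shrinks by a factor of 4 with 8-bit indices and by a factor of 2 with 16-bit indices.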

We stress that, while this discussion about using 8 or 16 bit indices seems to be excessively
implementation-dependent, it is not. Only through the multilevel approach is it possible to
exploit the small size of each local space. In the single-level approach, instead, the total
number of states that can be indexed (pointed to by a pointer) is $|\mathcal{R}|$, a number which can
easily exceed $10^7$ (see Table 1), and possibly more. In this case, we are forced to use 32-bit pointers, and
we can even foresee a point in the not-so-distant future where even these pointers will be
insufficient, while the multilevel approach can be implemented using 8 or 16 bit indices at
the last level regardless of the size of the reachability set, as long as $n_{K-1}$ is not too large.

Just as with a single search tree, we can implement our multilevel scheme using splay
or AVL trees. However, the choice between storing $\mathcal{R}$ and $\mathcal{U}$ separately or as a single set
has subtler implications. If we choose to store them as a single set, distinguishing between
explored and unexplored states or, more precisely, being able to access the unexplored states
only, still requires one of the approaches discussed for the single-level case (Fig. 4). Since
only local states are stored for each level, without repetition, we can no longer use the state
array approach of Fig. 4(c). The other two methods, which involve maintaining a list of
unexplored states, would only allow us to access nodes in trees at level $K-1$, which are
meaningless if we do not know the entire path from level 0. In other words, we would only
know the last component of each unexplored state, but not the first $K-1$ components. To
obtain all the components, we must add a backward pointer from each node (conceptually,
from each tree, since all the nodes in a given tree have the same previous components) to a
node in the previous level, and so on. This additional memory overhead is substantial.

With the single-level approach, then, we store $\mathcal{R} \cup \mathcal{U}$ in a single tree, using the state array
described in Fig. 4(c). With the multilevel approach, instead, we store $\mathcal{R}$ and $\mathcal{U}$ separately.
In this second case, states are truly deleted from $\mathcal{U}$. However, we are free to remove states
in any order we choose. When using AVL trees, we use the balance information to choose,
at each level, a state in a leaf of the tree in such a way that no rebalancing is ever required.
Figure 9: Memory usage (bytes) for the single and multilevel approaches.

Fig. 9 shows the number of bytes used to store $\mathcal{R}$ and $\mathcal{U}$, separately, for our three models,
using the single vs. the multilevel approach. The dashed boxes in Fig. 1 and 2 indicate how
the models have been decomposed into submodels; in all cases $K = 4$.

The multilevel approach is clearly preferable, especially for large models. Indeed, we
could not generate the state space using the single-level approach for $N = 6$ for the kanban-timed
model, while we could using the multilevel approach (all our experiments were run on
a Sun-clone with a 55 MHz HyperSparc processor and 128 Mbyte of RAM, without making
use of virtual memory).

6  Execution  time 

Several  factors  need  to  be  considered  when  studying  the  impact  of  our  approach  on  the 
execution  time.  First,  we  explored  possible  differences  due  to  using  splay  trees  vs.  AVL 
trees.  Fig.  10  shows  that  no  method  is  clearly  better  and  that,  in  most  cases,  the  differences 
are  minor  in  comparison  to  the  effect  due  to  the  choice  of  a  single  vs.  a  multilevel  data 
structure. 

We  then  focussed  on  this  second  factor,  using  AVL  trees  exclusively.  The  advantages  of 
the  multilevel  approach  are: 

• Consider first the case when BuildRS searches for a state $\mathbf{i}$ not yet in the current $\mathcal{R}$ (the
same discussion holds for $\mathcal{U}$). With a single-level tree, the search will stop only after
reaching a leaf, hence $O(\log|\mathcal{R}|)$ comparisons are always performed. In the multilevel
approach, instead, the search stops as soon as the substate $(i_0, \ldots, i_k)$ is not reachable,
for some $k \le K-1$. If all the trees at a given level are of similar size, this will require
at most $O(\log|\mathcal{R}_0^k|)$ comparisons. In practice, the situation is even more favorable,
because, for any level $l < k$, the tree searched, which stores $\mathcal{R}^l(i_0, \ldots, i_{l-1})$, contains
the substate $i_l$ we are looking for, so the jump from level $l$ to level $l+1$ might occur
before reaching a leaf.

Figure 10: Execution times (seconds) for the AVL and splay trees.

• If $\mathbf{i}$ is already in $\mathcal{R}$, instead, the search on the single-level tree will sometimes find $\mathbf{i}$
before reaching a leaf, but this is true also at each level of the multilevel approach, just
as in the previous case, for the levels $l < k$.

•  Perhaps  even  more  important,  regardless  of  whether  i  is  already  in  the  tree  or  not, 
is  the  complexity  of  each  comparison.  With  the  multilevel  approach,  only  substates 
are  compared,  and  these  are  normally  small  data  structures.  For  GSPNs,  this  could 
be  an  array  of  8  or  16  bit  integers,  one  for  each  place  in  the  sub-GSPN  (memory 
usage  is  not  an  issue,  since  each  local  state  for  level  k  is  stored  only  once).  On  the 
other  hand,  with  the  single-level  approach,  entire  states  are  compared.  If  the  states 
are  stored  as  arrays,  the  comparison  can  stop  as  soon  as  one  component  differs  in 
the  two  arrays.  However,  as  the  search  progresses  down  the  tree,  we  compare  states 
that  are  increasingly  close  in  lexical  order,  hence  likely  to  have  the  same  first  few 
components;  this  implies  that  comparisons  further  away  from  the  root  tend  to  be  more 
expensive.  Furthermore,  storing  states  as  full  arrays  is  seldom  advisable.  For  example, 
in SMART, a sophisticated packing is used, on a state-by-state basis, to reduce the
memory  requirements.  In  this  case,  the  state  in  the  node  needs  to  be  “unpacked” 
before  comparing  it  with  i,  thus  effectively  examining  every  component  of  the  state. 
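
The claim that comparisons become more expensive further from the root can be illustrated with a small sketch. It is not the paper's implementation: a binary search over a sorted Python list stands in for a balanced single-level tree, states are plain tuples, and the helper names `components_examined` and `search_costs` are hypothetical. The cost of each probe is the number of components read before the lexical comparison is decided.

```python
from itertools import product

def components_examined(a, b):
    """Number of components read before a lexicographic comparison is decided."""
    for n, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return n
    return len(a)

def search_costs(sorted_states, target):
    """Binary search; return the per-probe comparison costs, first probe first."""
    costs, lo, hi = [], 0, len(sorted_states)
    while lo < hi:
        mid = (lo + hi) // 2
        probe = sorted_states[mid]
        costs.append(components_examined(probe, target))
        if probe == target:
            break
        if probe < target:
            lo = mid + 1
        else:
            hi = mid
    return costs

# Hypothetical state space: all 4-component states with components in {0, 1}.
states = sorted(product(range(2), repeat=4))
costs = search_costs(states, (1, 1, 1, 1))
print(costs)  # [2, 3, 4, 4]
```

In this toy run the probes closer to the target share longer prefixes with it, so later comparisons read more components. With the multilevel structure each comparison involves a single local state instead.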


Section  7  further  investigates  the  complexity  of  searching  a  state,  reachable  or  not,  after 
the  entire  reachability  set  has  been  explored. 

6.1  Exploiting  event  locality 

Our  multilevel  search  tree  has  an  additional  advantage  which  can  be  exploited  to  further 
speed up execution. To illustrate the technique, we define two functions:

•  $\mathit{LocalSet}(k, I_{k-1})$, which, given a submodel index $k$, $0 \le k < K$, and a pointer $I_{k-1}$ to
a reachable substate $(i_0, \ldots, i_{k-1}) \in \mathcal{R}_0^{k-1}$, returns a pointer to the tree containing the
set $\mathcal{R}^k(i_0, \ldots, i_{k-1})$.

•  $\mathit{LocalIndex}(k, I_{k-1}, i_k)$, which, given a submodel index $k$, $0 \le k < K$, a pointer $I_{k-1}$ to
a reachable substate $(i_0, \ldots, i_{k-1}) \in \mathcal{R}_0^{k-1}$, and a local state index $i_k \in \mathcal{R}^k$, returns
the pointer $I_k$ to substate $(i_0, \ldots, i_k)$, if reachable, null otherwise.

Given our data structure, “a pointer $I_k$ to a reachable substate $(i_0, \ldots, i_k) \in \mathcal{R}_0^k$” points to
the node corresponding to the local state $i_k$, in the tree containing the set $\mathcal{R}^k(i_0, \ldots, i_{k-1})$.
To find whether state i is in (the current) $\mathcal{R}$, we can then use a sequence of K function
calls

$$\mathit{LocalIndex}(K-1, \underbrace{\mathit{LocalIndex}(K-2, \ldots \mathit{LocalIndex}(1, \mathit{LocalIndex}(0, \mathit{null}, i_0), i_1), \ldots, i_{K-2})}_{I_{K-2}}, i_{K-1}).$$

The  overall  cost  of  these  K  calls  is  exactly  what  we  just  discussed  when  comparing  the 
single  and  multilevel  approaches.  However,  if  i  is  reachable  and  we  now  wanted  to  find  the 
index of a state i' differing from i only in its last position, we could do so with a single
call $\mathit{LocalIndex}(K-1, I_{K-2}, i'_{K-1})$, provided we have saved the value $I_{K-2}$, the index of the
reachable substate $(i_0, \ldots, i_{K-2})$. A similar argument applies to other values of $k$. Only for
$k = 0$ do we need to perform an entirely new search.
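
A minimal sketch of the two functions follows, under simplifying assumptions: Python dicts stand in for the per-level search trees, a "pointer" is a reference to a node object, and the class name `MultiLevelSet` and the `calls` counter are hypothetical additions used only to count LocalIndex invocations.

```python
class Node:
    """Node for local state i_k; `child` is the level-(k+1) tree below it."""
    def __init__(self):
        self.child = {}   # a dict stands in for a search tree: local state -> Node

class MultiLevelSet:
    def __init__(self, K):
        self.K = K
        self.root = {}    # the single level-0 tree
        self.calls = 0    # number of LocalIndex calls, for illustration only

    def local_set(self, k, parent):
        """LocalSet(k, I_{k-1}): the tree holding R^k(i_0, ..., i_{k-1})."""
        return self.root if parent is None else parent.child

    def local_index(self, k, parent, ik):
        """LocalIndex(k, I_{k-1}, i_k): node for (i_0, ..., i_k), or None."""
        self.calls += 1
        return self.local_set(k, parent).get(ik)

    def insert(self, state):
        """Insert a state and return the pointers I_0, ..., I_{K-1}."""
        pointers, parent = [], None
        for k, ik in enumerate(state):
            parent = self.local_set(k, parent).setdefault(ik, Node())
            pointers.append(parent)
        return pointers

    def find(self, state, start=0, parent=None):
        """Search levels start..K-1, beginning below the pointer `parent`."""
        for k in range(start, self.K):
            parent = self.local_index(k, parent, state[k])
            if parent is None:
                return None          # substate (i_0, ..., i_k) is not reachable
        return parent

rs = MultiLevelSet(K=3)
ptrs = rs.insert((0, 1, 2))
rs.insert((0, 1, 3))

rs.calls = 0
full = rs.find((0, 1, 3))                            # K = 3 calls
full_calls = rs.calls

rs.calls = 0
fast = rs.find((0, 1, 3), start=2, parent=ptrs[1])   # reuses the saved I_{K-2}
fast_calls = rs.calls
print(full is fast, full_calls, fast_calls)          # True 3 1
```

Both searches return the same node, but the second one, which differs from an already located state only in its last position, needs a single call.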

We can then exploit this “locality” by considering the possible effects of the events on
the state. Given $\hat{\mathcal{R}} = \mathcal{R}^0 \times \cdots \times \mathcal{R}^{K-1}$, we can partition $\mathcal{E}$ into $\mathcal{E}^0, \ldots, \mathcal{E}^{K-1}$ such that

$$e \in \mathcal{E}^k \;\Rightarrow\; \left(\forall \mathbf{i} \in \hat{\mathcal{R}},\; \mathit{Active}(e, \mathbf{i}) = \mathit{True} \,\wedge\, \mathbf{j} = \mathit{New}(e, \mathbf{i}) \;\Rightarrow\; \forall l,\; 0 \le l < k,\; i_l = j_l\right),$$

that is, events in $\mathcal{E}^k$ can only change local states $k$ or higher. For example, for GSPNs,
this is achieved by assigning to $\mathcal{E}^k$ all the transitions whose enabling and firing effect on
the state depends only on places in sub-GSPN $k$, and possibly in sub-GSPNs $k+1$ through
$K-1$, but not in sub-GSPNs $0$ through $k-1$. This can be easily accomplished through an
automatic inspection even in the presence of guards and inhibitor and variable cardinality
arcs (although in this case the partition might be overly conservative, that is, it might assign
a transition to a level $k$ when it could have been assigned to a level $l > k$).
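
The automatic inspection described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: each transition is described only by the set of places it touches (inputs, outputs, inhibitor arcs, and guards), the dictionaries `place_level` and `touched_places` are hypothetical inputs, and a transition is assigned to the lowest-numbered level among its places.

```python
def partition_events(touched_places, place_level, K):
    """Assign each event to level k, the lowest level among the places it touches."""
    partition = [[] for _ in range(K)]
    for event, places in touched_places.items():
        k = min(place_level[p] for p in places)
        partition[k].append(event)
    return partition

# Hypothetical 3-level model: places p0..p4 assigned to sub-GSPNs 0, 1, 2.
place_level = {"p0": 0, "p1": 1, "p2": 1, "p3": 2, "p4": 2}
touched_places = {
    "t_a": {"p0", "p1"},   # touches level 0, so it can never be searched locally
    "t_b": {"p1", "p2"},
    "t_c": {"p2", "p3"},
    "t_d": {"p3", "p4"},   # purely local to the last level
}
print(partition_events(touched_places, place_level, K=3))
# [['t_a'], ['t_b', 't_c'], ['t_d']]
```

Treating every guard or variable-cardinality arc as "touching" all the places it mentions is what can make the result conservative: a transition may land at a lower level than strictly necessary, which costs time but not correctness.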

When exploring state i, in procedure BuildRS, we remove i from $\mathcal{U}$ and insert it into $\mathcal{R}$,
obtaining, as a byproduct, a sequence of pointers $I_{-1} = \mathit{null}, I_0, \ldots, I_{K-2}$, to the reachable
substates $(i_0), \ldots, (i_0, \ldots, i_{K-2})$. Then, we can examine the events in $\mathcal{E}$ in any order. As
soon as we find an enabled event $e \in \mathcal{E}^k$, we generate i' = New(e, i), and we are guaranteed
that its components $0$ through $k-1$ are identical to those of i. Hence, we can search for it
starting at $I_{k-1}$, that is, we only need to perform the $K-k$ calls

$$\mathit{LocalIndex}(K-1, \ldots \mathit{LocalIndex}(k+1, \mathit{LocalIndex}(k, I_{k-1}, i'_k), i'_{k+1}), \ldots, i'_{K-1}).$$

Figure 11: Execution times (seconds) for the single and multilevel approach.

The savings due to these accelerated searches depend on the particular model and the partition
chosen for it. We show the results for our three models in Fig. 11. In this case, savings
of  20%  execution  time  or  more  are  achieved.  Except  for  the  need  to  specify  a  partition, 
which  needs  to  be  done  anyway  to  use  our  multilevel  approach,  the  technique  is  completely 
automatic,  and  has  no  additional  memory  requirements. 
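
The accelerated search can be sketched on a toy model. The code below is only an illustration under several assumptions that are not in the paper: nested dicts replace the search trees, explored and unexplored states share one structure, the pointers to the substates of i are recomputed by the hypothetical helper `pointers` instead of being saved as a byproduct, and the three events form an invented example in which an event of level k never changes components 0 through k-1.

```python
def build_rs(initial, events, K, use_locality):
    """Explore the reachability set; return (states, number of per-level searches)."""
    root, searches = {}, 0

    def search_insert(state, start, node):
        """Search/insert components start..K-1 below `node`; report if state is new."""
        nonlocal searches
        is_new = False
        for k in range(start, K):
            searches += 1                  # one LocalIndex-like step per level
            if state[k] not in node:
                node[state[k]] = {}
                is_new = True
            node = node[state[k]]
        return is_new

    def pointers(state):
        """Entry k is the tree searched at level k along `state`."""
        node, path = root, [root]
        for ik in state[:-1]:
            node = node[ik]
            path.append(node)
        return path

    search_insert(initial, 0, root)
    unexplored, reached = [initial], {initial}
    while unexplored:
        i = unexplored.pop()
        path = pointers(i)
        for level, active, new in events:
            if active(i):
                j = new(i)
                start = level if use_locality else 0   # skip the unchanged prefix
                if search_insert(j, start, path[start]):
                    reached.add(j)
                    unexplored.append(j)
    return reached, searches

def toggle(state, k):
    """Flip component k and reset all higher components; lower ones are untouched."""
    return state[:k] + (1 - state[k],) + (0,) * (len(state) - k - 1)

# Hypothetical events as (level, Active, New) triples.
events = [
    (2, lambda s: True, lambda s: toggle(s, 2)),
    (1, lambda s: s[2] == 1, lambda s: toggle(s, 1)),
    (0, lambda s: s[1] == 1 and s[2] == 1, lambda s: toggle(s, 0)),
]
full, n_full = build_rs((0, 0, 0), events, 3, use_locality=False)
local, n_local = build_rs((0, 0, 0), events, 3, use_locality=True)
print(len(full), full == local, n_local < n_full)   # 8 True True
```

Both runs find the same set of states; the run with locality performs fewer per-level searches because events of level 1 and 2 start below an already known pointer.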

6.2  Application  to  solutions  based  on  Kronecker  operators 

We  can  also  apply  our  multilevel  approach  and  the  principle  of  locality  just  introduced 
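to speed up procedure VectMatrMultiply, described in Section 4. An improved version,
BtrVectMatrMultiply, of the same procedure is shown in Fig. 12. Its complexity is

$$O\Big(\eta\big((A^0)_{\mathcal{R}_0^0,\mathcal{R}_0^0}\big) \cdot \log n_0 + \eta\big((A^0 \otimes A^1)_{\mathcal{R}_0^1,\mathcal{R}_0^1}\big) \cdot \log n_1 + \cdots + \eta\big((A^0 \otimes \cdots \otimes A^{K-1})_{\mathcal{R}_0^{K-1},\mathcal{R}_0^{K-1}}\big) \cdot \log n_{K-1}\Big)$$

$$= O\Big(\eta\big((A^0 \otimes \cdots \otimes A^{K-1})_{\mathcal{R},\mathcal{R}}\big) \cdot \log n_{K-1}\Big).$$

Indeed, the term “$\log n_k$” for the searches at level $k$ is an upper bound, since the size of
the tree searched by LocalIndex is smaller than $n_k$ if there are any unreachable states; the
“average” size of a tree at level $k$ is $|\mathcal{R}_0^k| / |\mathcal{R}_0^{k-1}|$.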
BtrVectMatrMultiply(in: x, K, $A^0, A^1, \ldots, A^{K-1}$; inout: y)

for each $i_0 \in$ LocalSet(0, null) do
  $I_0 \leftarrow$ LocalIndex(0, null, $i_0$);
  for each $j_0$ s.t. $A^0_{i_0,j_0} > 0$ do
    $J_0 \leftarrow$ LocalIndex(0, null, $j_0$);
    if $J_0 \ne$ null then
      $a_0 \leftarrow A^0_{i_0,j_0}$;
      for each $i_1 \in$ LocalSet(1, $I_0$) do
        $I_1 \leftarrow$ LocalIndex(1, $I_0$, $i_1$);
        for each $j_1$ s.t. $A^1_{i_1,j_1} > 0$ do
          $J_1 \leftarrow$ LocalIndex(1, $J_0$, $j_1$);
          if $J_1 \ne$ null then
            $a_1 \leftarrow a_0 \cdot A^1_{i_1,j_1}$;
            ...
              for each $i_{K-1} \in$ LocalSet($K-1$, $I_{K-2}$) do
                $I_{K-1} \leftarrow$ LocalIndex($K-1$, $I_{K-2}$, $i_{K-1}$);
                for each $j_{K-1}$ s.t. $A^{K-1}_{i_{K-1},j_{K-1}} > 0$ do
                  $J_{K-1} \leftarrow$ LocalIndex($K-1$, $J_{K-2}$, $j_{K-1}$);
                  if $J_{K-1} \ne$ null then
                    $a_{K-1} \leftarrow a_{K-2} \cdot A^{K-1}_{i_{K-1},j_{K-1}}$;
                    $y_{J_{K-1}} \leftarrow y_{J_{K-1}} + x_{I_{K-1}} \cdot a_{K-1}$;

Figure  12:  Improved  procedure  to  multiply  a  vector  by  a  submatrix  of  a  Kronecker  product. 
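
The nested loops of Fig. 12 can be written for arbitrary K as a recursion over the levels. The sketch below is an illustration under assumptions that differ from the paper's data structure: the matrices are dense Python lists, nested dicts replace the search trees, and the last level maps each local state directly to the position of the global state in x and y. The function name and argument layout are hypothetical.

```python
def btr_vect_matr_multiply(x, mats, root, y):
    """y += x * (A^0 (x) ... (x) A^{K-1}), restricted to the reachable states.

    mats[k][i][j] is entry (i, j) of A^k. `root` is a nested dict: level k maps
    local state i_k to the next-level dict, and the last level maps i_{K-1} to
    the position of the state in x and y.
    """
    K = len(mats)

    def recurse(k, tree_i, tree_j, a):
        for ik, I in tree_i.items():                  # i_k in LocalSet(k, I_{k-1})
            for jk, entry in enumerate(mats[k][ik]):
                if entry == 0:                        # keep only A^k[i_k][j_k] > 0
                    continue
                J = tree_j.get(jk)                    # LocalIndex(k, J_{k-1}, j_k)
                if J is None:                         # (j_0, ..., j_k) unreachable
                    continue
                if k == K - 1:
                    y[J] += x[I] * a * entry
                else:
                    recurse(k + 1, I, J, a * entry)   # a accumulates the product

    recurse(0, root, root, 1.0)
    return y

# Hypothetical example with K = 2 and reachable states (0,0), (0,1), (1,1).
A0 = [[0, 1], [2, 0]]
A1 = [[1, 1], [0, 3]]
root = {0: {0: 0, 1: 1}, 1: {1: 2}}
x = [1.0, 2.0, 3.0]
y = btr_vect_matr_multiply(x, [A0, A1], root, [0.0, 0.0, 0.0])
print(y)  # [0.0, 18.0, 7.0]
```

The test `J is None` prunes an entire subtree of column indices as soon as a substate is found to be unreachable, which is where the multilevel structure saves work compared with searching for each full state.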


7  Compressing $\mathcal{R}$ after exploration

So  far  we  have  discussed  techniques  to  store  the  reachability  set  while  it  is  being  built  by 
procedure  BuildRS.  After  this  phase,  though,  dynamic  data  structures  such  as  search  trees 
are  unnecessary;  a  much  more  memory-efficient  data  structure  can  be  used.  For  superposed 
GSPNs, this was proposed by Kemper [13], who uses an integer array of size $|\mathcal{R}|$. The $I$-th
position of this array stores the $I$-th marking in lexical order, $\Psi^{-1}(I)$. Using this array, a
simple binary search can be used to compute $\Psi(\mathbf{i})$ for a given $\mathbf{i} \in \hat{\mathcal{R}}$, where $\Psi(\mathbf{i}) = \mathit{null}$ if
$\mathbf{i} \notin \mathcal{R}$. An underlying assumption in [13] is that each marking can be encoded into a 32-bit
integer, and this might not be easy for very large models.
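
The single-array scheme can be sketched as follows. The encoding used in [13] is not described here, so the fixed-width packing below is an assumption chosen only because it makes integer order coincide with lexical order; the function names are hypothetical.

```python
from bisect import bisect_left

def encode(marking, bits_per_place):
    """Pack a marking into one integer, first place most significant."""
    code = 0
    for tokens in marking:
        code = (code << bits_per_place) | tokens
    return code

def build_array(markings, bits_per_place):
    """Sorted array of encoded markings: position I holds the I-th marking."""
    return sorted(encode(m, bits_per_place) for m in markings)

def psi(codes, marking, bits_per_place):
    """Binary search for the index of `marking`; None if it is not reachable."""
    code = encode(marking, bits_per_place)
    pos = bisect_left(codes, code)
    return pos if pos < len(codes) and codes[pos] == code else None

# Hypothetical reachable markings of a 3-place net, 3 bits per place.
codes = build_array([(1, 0, 2), (0, 3, 1), (0, 0, 4)], 3)
print(codes, psi(codes, (0, 3, 1), 3), psi(codes, (2, 0, 0), 3))
# [4, 25, 66] 1 None
```

With this packing, a net with more than four places at 8 bits each already exceeds 32 bits, which illustrates why the encoding assumption becomes restrictive for large models.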

Our  multilevel  approach  can  help  here  as  well.  We  can  store  the  nodes  of  a  tree  at  level 
k  <  K  —  1  as  an  ordered  array  of  pairs  of  pointers  (see  Fig.  13).  The  first  one  points  to  the 
local  state  for  level  k  (alternatively,  we  could  store  either  an  index  into  an  array  containing 
the elements of $\mathcal{R}^k$ in any order, or the local state itself); the lexical position of the particular
local  state  determines  the  array  order.  The  second  pointer  points  to  the  array  used  to  store 
the  tree  at  the  next  level.  At  the  last  level,  we  do  not  need  to  store  a  pointer  to  a  lower-level 
tree, hence only one pointer or, better, one index is required for each node. This is very
desirable, since, as discussed before, only the last level has a major effect on the memory
requirements. Using 8- or 16-bit indices, we can then store $\mathcal{R}$ using only slightly more than
$|\mathcal{R}|$ or $2|\mathcal{R}|$ bytes, respectively.

Figure 13: Multilevel storage after the reachability set has been built.
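
A possible rendering of the array-based multilevel structure is sketched below. It rests on assumptions beyond the text: the source structure is a nested dict, local states are small integers stored directly in the arrays, and each last-level array carries a base offset so that the lexical index of a state can be returned (the paper does not say how this index is obtained). The names `compress` and `psi` are hypothetical.

```python
from array import array
from bisect import bisect_left

def compress(tree, depth, K, counter):
    """Turn a nested-dict tree into sorted arrays, numbering states lexically."""
    keys = sorted(tree)
    if depth == K - 1:
        base = counter[0]                 # index of the first state in this array
        counter[0] += len(keys)
        return array("B", keys), base     # 8-bit local-state indices
    return keys, [compress(tree[k], depth + 1, K, counter) for k in keys]

def psi(compressed, state):
    """Return the lexical index of `state`, or None if it is not stored."""
    node = compressed
    for ik in state[:-1]:
        keys, children = node
        pos = bisect_left(keys, ik)
        if pos == len(keys) or keys[pos] != ik:
            return None                   # substate unreachable: stop early
        node = children[pos]
    last, base = node
    pos = bisect_left(last, state[-1])
    if pos == len(last) or last[pos] != state[-1]:
        return None
    return base + pos

# Build a small nested-dict set, then compress it (K = 2).
root = {}
for s in [(1, 1), (0, 0), (0, 1)]:
    node = root
    for ik in s:
        node = node.setdefault(ik, {})
comp = compress(root, 0, 2, [0])
print(psi(comp, (0, 1)), psi(comp, (1, 1)), psi(comp, (1, 0)))  # 1 2 None
```

Only the last-level arrays grow with the number of states, and they hold one byte per state here, in line with the memory estimate above.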

The  execution  time  required  to  transform  our  tree-based  multilevel  data  structure  into 
the  array-based  one  is  negligible,  around  1/1000  of  the  time  used  by  BuildRS.  Hence,  this 
transformation is always advisable if searches are to be performed on $\mathcal{R}$. This is the case,
for example, when computing $\Psi(\mathbf{i})$ for some $\mathbf{i} \in \hat{\mathcal{R}}$, one of the essential operations in the
Kronecker approach we discussed. To further investigate the impact of our multilevel data
structure, we computed the expected time required to search for a reachable state $\mathbf{i} \in \mathcal{R}$
(i.e., to compute $\Psi(\mathbf{i})$ when $\mathbf{i} \in \mathcal{R}$), and for a non-reachable state (i.e., to compute $\Psi(\mathbf{i})$
when it is null) in the kanban-timed model. Fig. 14 shows the results for the single vs. the
multilevel  approach.  The  dramatic  difference  in  the  slope  of  the  curves  for  the  single  and 
multilevel  approaches  further  confirms  that  our  results  will  greatly  improve  the  efficiency  of 
solution  methods  based  on  Kronecker  operators. 

Also, note how the curves for $\mathbf{i} \in \mathcal{R}$ and $\mathbf{i} \notin \mathcal{R}$ intersect, in both cases. We generated
unreachable states by determining the bound $b_p$ on the number of tokens in each place $p$,
then randomly choosing a number $n_p$ of tokens between 0 and $b_p$ for $p$, independently of the
other places. The resulting marking was (almost) always unreachable. For small reachability
sets, searching for a reachable state is faster than searching for an unreachable state, because we
can sometimes stop the search before reaching a leaf. However, as the state space grows, it is
increasingly likely that, when searching for an unreachable state, the comparison between two
states (or substates, for the multilevel approach) stops after comparing just a few places.
Clearly, this effect more than offsets the need to explore up to a leaf for the single-level
approach (in the multilevel approach, an additional advantage is that, for an unreachable
state, the search might stop at a level $k < K-1$).
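
The sampling procedure for (almost always) unreachable markings can be sketched in a few lines. The bounds and the tiny reachable set below are invented for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random

def random_marking(bounds, rng):
    """Draw a token count n_p in [0, b_p] for each place p, independently."""
    return tuple(rng.randint(0, b) for b in bounds)

rng = random.Random(0)                 # fixed seed, for repeatability only
bounds = (3, 3, 1, 2)                  # hypothetical bounds b_p for four places
reachable = {(3, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0)}   # hypothetical set R
samples = [random_marking(bounds, rng) for _ in range(1000)]
unreachable = [m for m in samples if m not in reachable]
print(len(unreachable), "of", len(samples), "samples are unreachable")
```

Because the product of the per-place ranges is usually far larger than the reachability set, most samples fall outside it; any sample that happens to be reachable is simply filtered out.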

Figure 14: Expected search times (microseconds) for the single and multilevel approach.
Plot title: search time for a single state after compression (kanban-timed).

8  Conclusion

We have presented a detailed analysis of a multilevel data structure for storing and searching
the large set of states reachable in some high-level model. Memory usage, which is normally
the main concern, is greatly reduced. At the same time, this reduction is achieved not at
the expense of, but in conjunction with, execution efficiency.

A further technique based on event locality achieves an additional reduction in the execution
time, with no additional memory cost.

The  results  substantially  increase  the  size  of  the  reachability  sets  that  can  be  managed, 
and  they  can  be  particularly  effective  for  the  Kronecker-based  approaches  that  have  recently 
been  proposed. 

In  the  future,  we  will  investigate  how  to  use  our  approach  in  a  distributed  fashion,  as  done 
in  [6],  where  N  cooperating  processes  explore  different  portions  of  the  reachability  set.  In 
particular,  we  plan  to  apply  the  data  structure  we  presented  and  to  explore  ways  to  balance 
the  load  among  the  cooperating  processes  automatically. 